How many eigenvalues of a Gaussian random matrix are positive? 
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We study the probability distribution of the index J^^, i.e., the number of positive eigenvalues of an A'' x TV 
Gaussian random matrix. We show analytically that, for large and large with the fraction < c = 
^+ < 1 of positive eigenvalues fixed, the index distribution J'(!N+ = cN, N) ~ exp [-/3iV2<E>(c)] where 
P is the Dyson index characterizing the Gaussian ensemble. The associated large deviation rate function $(c) 
is computed explicitly for all < c < 1. It is independent of /3 and displays a quadratic form modulated by a 
logarithmic singularity around c — 1/2. As a consequence, the distribution of the index has a Gaussian form 
near the peak, but with a variance A{N) of index fluctuations growing as A(iV) ~ log N/I3n^ for large iV. For 
13 — 2, this result is independently confirmed against an exact finite N formula, yielding A (A'') — log N/2-k^ + 
C + 0{N~'^) for large N, where the constant C has the nontrivial value C = (7 + 1 + 31og2)/27r^ ~ 
0.185248... and 7 = 0.5772... is the Euler constant. We also determine for large A*" the probability that the 
interval [("1, ("2] is free of eigenvalues. Part of these results have been announced in a recent letter [Phys. Rev. 
Lett. 103, 220603 (2009)]. 

PACS numbers: 02.50.-r: 02.10.Yn; 24.60.-k 

Keywords: Gaussian random matrices, large deviations, Coulomb gas method, index 



I. INTRODUCTION 



Statistical properties of eigenvalues of random matrices have been extensively studied for decades, stemming from the sem- 
inal work of Wigner [1]. Random Matrix Theory (RMT) has successfully provided tools and methods to disparate areas of 
physics and mathematics [''], with countless applications so far. Statistics of several random variables associated with random 
eigenvalues have been studied extensively. This includes the length of a gap in the eigenvalue spectra, number of eigenvalues in 
a given interval, the largest eigenvalue, the trace etc. [2]. Most studies concerned with the probability of typical fluctuation of 
such a random variable around its mean. 

However, various recent applications of random matrix theory have posed questions regarding atypical large fluctuations 
of such random variables associated with the eigenvalues, thus triggering a number of recent studies on the large deviation 
probabilities of such random variables. This includes, for instance, the large deviation probability of the extreme (maximum and 
minimum) eigenvalues of Gaussian [3-7] and Wishart random matrices [4, 8, 9], of the number of stationary points of random 
Gaussian landscapes [10, 1 1], of the distribution of free energies in mean-field spin glass models [12, 13], of the conductance 
and shot noise power in chaotic mesoscopic cavities [14, Id], of the entanglement entropy of a pure random state of a bipartite 
quantum system [16-19] and of the mutual information in multiple input multiple output (MIMO) channels [20]. In addition, 
random matrix theory has been used to understand large deviation properties of various observables in the so called vicious 
walker (or nonintersecting Brownian motion) problem [2 1 -24] . The purpose of the present paper is to provide a detailed analysis 
of the large deviation properties of another natural random variable for large Gaussian matrices, namely the fraction c of positive 
eigenvalues of an x Gaussian matrix. Part of the main results presented here were announced in a recent Letter [2^]. We 
will explain shortly why this fraction c is a natural observable that arises in a number of physical situations. But before we do 
that, it is useful to recall some well-known facts about Gaussian matrices. 

There are three families of Gaussian random matrices with real spectrum: orthogonal (GOE), unitary (GUE) and symplectic 
(GSE). The N x N matrices belonging to these families are real symmetric, complex hermitian and quaternion self-dual respec- 
tively, whose entries are independent Gaussian variables (real, complex or quaternions) labeled by the Dyson index /? = 1,2,4 
respectively. The probability distribution of the entries of a matrix 'M. is then given by the Gaussian weight: 

T(M)cxcxp(^-^(M,M)^ (1) 

where (M., 'M.) stands for the inner product on the space of matrices invariant under orthogonal, unitary and symplectic trans- 
formation respectively. Explicitly, one has: 



{M,M) = Tr(Af2), 


/3 = 1 


GOE 


(2) 


{M,M) = Tt{M*M), 


13 = 2 


GUE 


(3) 


{M,M) = Tr(Af^M), 


/3 = 4 


GSE 


(4) 
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where ★ denotes hermitian conjugation and f the quaternion self-dual. The celebrated result by Wigner states that for large 
matrix size N, the average density of eigenvalues (all real) for such ensembles has a /3-independent semicircular form [1,2] 



which vanishes identically at the two edges ±\^2N and is normalized to unity. Clearly, the mean spacing between eigenvalues 
in the bulk, i.e., close to the origin, behaves for large N SiS 6n = l/(^Psc(0)) = 7r/-\/2iV. 

A natural and much studied question that goes back to Dyson [26] is: how many eigenvalues are there in a given interval 
[a, b] on the real line? Clearly this number 3\f[a is a random variable that fluctuates from one sample to another Its mean 
value, for large N, is easy to compute by integrating the semi-circular average density in (5) over the interval [a, b]: (^[a,b]) = 

/a Psc(A, N)dX. But how does this number fluctuate from one sample to another? Dyson studied this number fluctuation in 



the so called bulk limit, i.e., he focused on a small symmetric interval around the origin [—6nL/2, SnL/2] where Sjy = tt/\/2N 
is the mean bulk spacing and L is kept fixed while one takes the N oo limit. Let denote the number of eigenvalues in this 
interval. Clearly, the mean number of eigenvalues {Nl) ~ L. But Dyson also computed the variance of in the large N limit 
(with L fixed) and showed that for large L the variance grows logarithmically with L 



and the constant Bp was computed by Dyson and Mehta [27]. Thus the typical fluctuations of grow as y^IogX for large L. 
More recently, even the higher moments of J^l (in the N oo limit with L fixed) were computed which proved that on a scale 
of y/\og L around the mean L, the random variable J^l has a Gaussian distribution [28, 29]. 

Here our focus will be on a different limit, namely we study the statistics of the number of eigenvalues, not on a small 
symmetric interval around the origin (i.e, the bulk limit), but rather on the full unbounded interval [0, oo] . In other words, we are 
interested simply in the distribution of the number of positive eigenvalues (called the index) of a Gaussian random matrix 
M. Since the average density of states is symmetric in A, it is clear that on average there are (3\f+) = positive eigenvalues. 
Clearly the index 3\f+ fluctuates from one realization of the matrix to another and in this paper, we are precisely interested in the 
fluctuation properties of the random variable ?^+, i.e., in the full probability distribution CP(^+, N). Evidently, < < N. 
Also, the number of negative eigenvalues ?sf_ ^ N — 3\r+ is distributed identically as the number of positive eigenvalues 3\f_|- by 
virtue of the Gaussian symmetry, indicating A^) = 7{N — 3\r_|_, A^). Hence the distribution 5'(3\f+, A^) of is clearly 

symmetric around its mean value (3Nf+) = A^/2. It thus suffices to study the range N/2 < 3\r+ < A^. 

So, why are we interested in this index distribution? This question naturally arises in the study of the stability patterns 
associated with a multidimensional potential landscape V{xi, X2, ■ ■ ■ , xjv) [30]. For instance, in the context of glassy systems, 
the point {xi} represents a configuration of the system and is just the energy of the configuration [31]. Similarly, in 

the context of disordered systems or spin glasses, may represent the free energy landscape. In the context of string 

theory, V may represent the potential associated with a moduli space [32]. Typically such an A^-dimensional landscape has 
many stationary points (minima, maxima and saddles) with complex stability patterns that play an important role both in statics 
and dynamics of such systems [ju]. The stability of a stationary point of this A^-dimensional landscape is decided by the A^ 
real eigenvalues of the (A^ x A^) Hessian matrix Mij = [d'^V/dxidxj] which is symmetric. If aU the eigenvalues are positive 
(negative), the stationary point is a local minimum (local maximum). If some, but not all, are positive then the stationary point 
is a saddle. The number of positive eigenvalues (the index), < 3^+ < N, is then a key object that determines in how many 
directions the stationary point is stable. Given a random potential V, the entries of the Hessian matrix at a stationary point 
are usually correlated. However, in many situations, important insights can be obtained by ignoring these correlations and just 
assuming the entries of the Hessian matrix are just independent Gaussian variables. This then leads to the study of the statistics 
of index for a GOE matrix. This toy model, called the random Hessian model (RHM), has been studied extensively in the context 
of disordered systems [j 1], landscape based string theory [33] and also in quantum cosmology [i4]. Although in RHM /3 = 1, 
it is quite natural to study the index distribution for other Gaussian ensembles, namely for GUE {(3 = 2) and GSE (/? = 4). 

For the GOE (/3 = 1), the statistics of J^^ was studied by Cavagna et al. [3 I] using supersymmetric replica methods and 
some additional approximations. They argued that around its mean value N /2, the random variable 3\f-|_ has typical fluctuations 
of O(vTogA') for large A^. Moreover, the distribution of these typical fluctuations is Gaussian. In other words, over a region of 
width -y/IoglV, the distribution for large A^ is given by [31] 




(5) 



{{'H^-Lf)^^\og{L) + Bp 



(6) 



TT' 



.2 



J'(K+, A^) w cxp 



(?^+ - NI2) 



2 



(7) 



21og(Ar) 



implying tiiat for /3 = 1, A(Ar) = {{1^+ - N/2)^) « log(Ar)/7r' 



^ for large N. 
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On the other hand, this Gaussian form does not describe the atypically large fluctuations of 3Nf+. For example, in the extreme 
limit when ]\f+ ^ N, the probability that all eigenvalues are positive = N, N) was computed recently for large N and 

for all /3 [3], 

= N,N)^ cxp [~/39N^] ; 9^^ log(3). (8) 

This question of the probability of extreme large fluctuation of ?\f+ (fluctuation on a scale ~ O(A^) around its mean N/2) 
naturally came up in several recent contexts such as in landscape based string theory [33], quantum cosmology [3 '] and in the 
distribution of the number of minima of a random polynomial [35]. 

These two rather different forms of the distribution 'J'{J{^,N) in the two limits, namely in the vicinity of J^^ = N/2 (over 
a scale of VToglV) (as in (7)) and when = N (as in (8)) raise an interesting question: what is the form of the distribution 
J'(N+,iV) for intermediate values of N/2 << 3\f+ < A^? In other words, how does one interpolate between the limits of 
typically small and atypically large fluctuations? To answer this question, it is natural to set 7^+ = cN where the intensive 
variable < c < 1 denotes the fraction of positive eigenvalues and study the large N limit of the distribution 'J'{cN, N) with 
c fixed. Again, due to the Gaussian symmetry, 'J'{cN,N) = J'((l — c)N,N) and it is sufficient to restrict c in the range 
1/2 < c < 1. 

In a recent Letter [25], we computed the large N limit of the distribution y{cN, N) in the full range < c < 1 for all > 
and showed that 

TicN, N) w cxp N^ $(c)] (9) 

where the rate function $(c) = $(1 — c), independent of /3, was computed explicitly for all 1/2 < c < l'. The fact that the 
logarithm of the probabihty is ^ 0{N'^) for fixed c is quite natural, as it represents the free energy of an associated Coulomb 
fluid of N charges (eigenvalues) (to be discussed in detail later). The Coulomb energy of N charges clearly scales as ^ 0{N'^). 
In the limit c — > 1, we get $(1) ~ 9 ~ log(3)/4 in agreement with (8). The distribution is thus highly non-Gaussian near its 
tails. In the opposite Umit c — > 1/2, we find a marginally quadratic behavior, modulated by a logarithmic singularity 

2 1og(c-l/2)- ^'^^ 



Setting c = ?\f_|- /N and substituting this form in (9), we find that in the vicinity of ~ N/2 and over a scale of \/\og N, 
indeed one recovers the Gaussian distribution 



y{yi+,N) sa cxp 



'21og(7V) 



{Ji+ - N/2f 



(11) 



thus proving that the variance A(A^) = ((>[+ - N/2Y) « \og{N) / Pt:'^ for large N and for ah (3. For /3 = 1, this perfectly 
agrees with the results of Cavagna et al. [31]. 

In addition to obtaining the full distribution J'(cA^, N) of the fraction of positive eigenvalues c, our Coulomb gas approach 
also provides a new method of finding solutions to singular integral equation with two disconnected supports, as discussed in 
detail later This method is rather general and can be fruitfully applied to other related problems in RMT, an example is later 
provided in the paper in calculating the probability that an interval [(^i, C2] is free of eigenvalues, i.e., there is a gap [^1, C2] in 
the spectrum. The details of these calculations are somewhat involved and were not presented in our previous Letter [25]. The 
purpose of this paper is to provide these details which we believe will be important for other problems as well. 

The paper is organized as follows. In Section II. A we set up the problem and show that the rate function can be computed 
via the solution of a singular integral equation on a disconnected support. In subsections II. B and II. C, we provide two different 
strategies to find such a solution, the first based on a scalar Riemann-Hilbert ansatz and the second based on an iterated applica- 
tion of a theorem by Tricomi. In subsection II. D we derive the free energy of the associated Coulomb gas and the large deviation 
function $(c) associated with the index distribution. In subsection II. E we provide an asymptotic analysis of $(c) near c = 1/2 
and determine the variance of the index for large matrix size N . In section III we provide details of numerical simulations. As 
an application of the general method for solving two-support integral equation, we compute in section IV, the probability that a 
Gaussian random matrix has a gap [Ci , C2] in the spectrum. In section V we offer a derivation of a determinantal formula for the 
variance of the index at finite N for /3 = 2. Finally, we conclude with a summary in section VI. 



Hereafter, the notation stands for the precise asymptotic law limjv-»oo — = ^(c)- 
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II. THE PROBABILITY DISTRIBUTION OF THE INDEX 



A. Setting and Notation 

We consider the standard Gaussian ensembles of random matrices with Dyson index /3 = 1,2,4, corresponding to real, 
complex and quaternion entries respectively. The probability distribution of the entries is given in (1) and consequently the joint 
probability density of eigenvalues reads [2] 

^(Ai, . . . , Aw) = -^e-^ ^" 1 TT |A, - Afcl^ (12) 

where is the normalization constant which can be explicitly computed via a Selberg-like integral [2] and to leading order for 
large N, Zn « exp{-/3noN^) where Qo = (3 + 21og2)/8 [3]. 

We wish to compute the probability distribution J'(3Nr+, N) of the index defined as the number of positive eigenvalues of 
the iV X matrix M: 

N 



?^+=^0(AO (13) 



i=l 



By definition: 



7(-oo,oo)« V '<k \ tl J 

We will set 3\f+ = cN where < c < 1 is the fraction of positive eigenvalues. As mentioned in the introduction, due to the 
Gaussian symmetry, the number of positive eigenvalues 3\f+ will have the same distribution as the number of negative eigenvalues 
= iV - 3\f+. Hence, ^(ciV, iV) = T ((1 - c)N, N) (the distribution is symmetric around c = 1/2). Thus, it is sufficient to 
focus only on the range 1/2 < c < 1. 

The evaluation of the iV-fold integral (14) in the large N limit consists of the following steps: first, we write the integrand 
(ignoring the delta function) as, exp [~l3E{{Xi})] with E{{X,}) = -(1/2) J2j^k log - Afc| + (1/2) J2t A?- Written in this 
form, the integral has a natural interpretation as the partition function of a Coulomb gas in equilibrium at inverse temperature /3. 
We can identify A/s as the coordinates of the charges of a 2-d fluid confined on the real axis. The charges repel each other via the 
2-d logarithmic Coulomb potential and are confined by a quadratic external potential. Then E is the energy of this Coulomb gas. 
Furthermore, the Coulomb energy scales, for large N, as ~ 0{N^) (since it involves pairwise interaction between N charges). 
In contrast, the external potential energy scales as ^ A^yp where Atyp is a typical eigenvalue. Balancing the two energy scales, 

one finds that a typical eigenvalue scales as Atyp ^ \/iV for large A^. 

The next step is to evaluate this partition function of the Coulomb gas in the large N limit via the saddle point method. 
In the large N limit, the eigenvalues become rather dense and one can then take a continuum limit where one replaces the 
integration over the discrete eigenvalues by a functional integral over the density of these eigenvalues. Originally introduced by 
Dyson [26], this procedure (see also [36]) has recently been successfully used in a number of different contexts. These include 
the computation of the extreme eigenvalue distribution of Gaussian [3, 4] and Wishart random matrices [ , , '], counting the 
number of stationary points of random Gaussian landscapes [10, 11], and computing the distribution of the bipartite quantum 
entanglement [16-18]. In addition, this method has also been used recently in systems such as nonintersecting fluctuating 
interfaces in presence of a substrate [22], in computing the distribution of conductance and shot noise power in mesoscopic 
cavities [14, 15] and in the study of multiple input multiple output (MIMO) channels • ]. 

Dyson's prescription requires first a coarse-graining procedure, where one sums over (partial tracing) all microscopic configu- 
rations of Ai's compatible with a fixed charge density function £»Af (A) = J^i (^(A — A;). Secondly, one performs a functional 
integral over all possible positive charge densities g7v(A) normalized to unity. Finally the functional integral is carried out in the 
large N limit by the saddle point method. 

Following this prescription, we introduce a continuum fluid representation for the Coulomb cloud of eigenvalues with density 
ew(A) = N^-^ Sill '^(A — Ai). Since Atyp ~ VN, it follows that the normalized density should have the scaling form 
f?7v(A) = N^^/'^ fc{\/ \/N) for large N. The scaled density fdx) satisfies the obvious normalization conditions: 



dxfcix) = 1 (15) 
dx0{x)fc{x) = c (16) 
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where we have set 3^+ = cN with 1/2 < c < 1 being the fraction of positive eigenvalues. The probability density (14) can then 
be rewritten as a functional integral over fc{x) as: 



y{'N+ = cN, N) 



Zc{N) 



(17) 



where the numerator Zc{N) reads: 

a)[/,(a:)]exp -|iV2 / dxx^Ux)- / dxdx'fc{x)fc{x')\og\x~x'\ + 



dx9{x)fc{x) - c 



A. 



dxfc{x) - I 



(18) 



where Ai,A2 are Lagrange multipliers enforcing the normalization conditions (15) and (16). 
We define the action S[fc{x)] as: 



S[fc{x)] 



dxx^ fc{xy 



OO /'C30 



dxdx' fc{x)fc{x') log |a;— x'l+Ai 



— OO — OO 



dxe{x)f,{x)-c]+A2 



dxfc{x) - 1 
(19) 

Evaluating (18) by the method of steepest descent and using the large N asymptotics of the denominator Zm in (17) gives, to 
leading order for large N, 



Z,(iV)«exp (^-^N^S[f*{x)] 
where f2o = (3 + 2 log2)/8 [3] and f*{x) is the solution of the saddle point equation 



SSjUx)] 2 

^ " — If — ^ 



/OO 
dx'f*{x')\og\x-x'\ 
-OO 



(20) 
(21) 

(22) 



The function f*{x) can be interpreted as the equilibrium (or optimal) charge density of the eigenvalue fluid, given a fixed fraction 
c of positive charges. Once we obtain the solution f* {x) of the integral equation (22), we can evaluate the saddle point action in 
(20), and together with (21) one then gets the index distribution 



VicN, N) 



Zc{N) 
Zn 



( 



exp 



-s[!t{x)] - no 



V 



(23) 



J 



where <i>(c) is the large deviation function. 

Thus aU we have to do is to solve the saddle point equation (22) for a fixed 1/2 < c < 1. To avoid the Lagrange multipUers, 
it is convenient to differentiate (22) with respect to x and for (x ^ 0), one gets the integral equation 



X = Pr 



dx' 



, X ~ x 

(where Pr denotes Cauchy's principal value), supplemented with the constraints: 

dxf*{x) = 1 
dxf*{x) = c 



(24) 

(25) 
(26) 



Singular integral equations of this type have been studied by Tricomi [-''], who derived an explicit formula for the solution f*{x) 
in the case when the solution is nonzero over a single finite connected support x e [Li, L2] where Li and L2 are respectively 
the lower and the upper end of the support. Tricomi's theorem states that the general solution f{x) to singular integral equations 
of the form 



Pr 



fix') 



dx' 



(27) 
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over the interval [Li, 1/2] with Li < L2 (where the source function g{x) is given and arbitrary) is [37]: 



7I'^V(-^2 - X){x - Li) 



^{L2~x'){x'-L,) ,,,,,, ^ 
Pr / — — — g[x )ax + Bi 



x 



X' 



(28) 



where Bi = — tt jj^^ f{x)dx is a constant. 

Let us then first assume that indeed the solution f*{x) of (24), with the source function g{x) = x, has a single support over 
[Li,L2]- Substituting g{x) = x, one can evaluate the integral in (28) explicitly to obtain 



1 



8TT^iL2-x){x-Li) 



[{L2 - Lif + 4(L2 + Li)x - 8x^ + 8] 



(29) 



where we have used the normahzation condition Jj^^ f*{x)dx = 1 to set the constant Bi = — tt. There are two unknown 
constants Li, L2 which are to be fixed from the constraint (26) and the consistency condition that the solution f*{x) (which 
represents a density) must be non-negative over [Li, L2\. At the two endpoints Li and L2, the solution either vanishes or has an 
inverse square root divergence (which is integrable). If we try to evaluate these constants, it is easy to check that a non-negative 
consistent solution is possible only for two limiting values of c, namely c = 1/2 and c~l. Let us discuss these two cases first. 

The case c = 1/2: In this case, the solution must be symmetric which indicates Li = — ^2- In addition, it is clear physically 
that the solution must vanish at the endpoints Li and L2. This fixes L2 = —Li = and the solution in (29) reduces to the 
Wigner semicircle law, namely 



(30) 



This is reassuring and is expected for the following reason: if there was no constraint at all on the fraction of positive eigenvalues, 
the system would naturally choose to have half the eigenvalues positive and half negative on average, implying (?Nf+) = N/2, 
and the equihbrium charge density would be the standard Wigner' s semicircle law. 

The case c = 1: In the other extreme limit c ~ 1 where all the eigenvalues are forced to be positive, one can again find a 
consistent solution from (29) that satisfies all the constraints and is given by 



fiix) 



1 

2^ 



L — X 



[L + 2x] 



(31) 



where L = 2-^/2/3. In this case, the support is over [0, L] with Li = 0, L2 = L. Note that this solution vanishes at the upper 
edge X = L and diverges as at the lower edge a; = 0. This explicit solution was first obtained in [3]. 

It turns out that for other values of 1/2 < c < 1, there is no single support solution (29) that satisfies the constraint (26) and 
is non-negative for all x £ [Li, £2]- To see what is going wrong, it was instructive to perform numerical simulation (the details 
of which will be described later) for 1/2 < c < 1. For example, for c = 0.6, the optimal density is given in Fig. (1). It is evident 
from the figure that for c = 0.6, indeed there are two disconnected supports of the optimal charge density f*{x). 

A similar feature actually holds for alll/2<c<l. Asc— >1 from below, the area under the left support vanishes and 
we are left with a single support over [0, L] as in (31). On the other hand, as c decreases continuously, the area under the left 
support grows and the upper edge of the left support (always on the negative side) also increases. Finally when c hits 1/2, the 
two supports merge into a single support, symmetric about the origin, and reduces to the Wigner semicircle law (see Fig. (2)). 

Hence, it is not surprising that we cannot obtain any consistent single support solution using Tricomi's result in (29) for 
1/2 < c < 1, as the optimal density does not have a single support but rather two disconnected supports. The technical reason 
for the two-support solution can indeed be traced back to the jump discontinuity at .t = due to the Heaviside theta function in 
the saddle point equation (22). So, the main technical challenge is how to obtain analytically an explicit two-support solution of 
the integral equation (24) for all 1/2 < c < 1, given that we cannot use the Tricomi solution any more. This is an interesting 
mathematical challenge since such two-support solutions appear in other problems as well and a general method would be very 
useful. This is what we achieve here as detailed in the next two subsections. In fact we will present two different approaches 
producing the same results. But before we get into the technical details of the two methods, it may be useful to summarize here 
the main result. 

We show that the solution of (24) satisfying the constraints (25) and (26) and the condition of non-negativity, for all 1/2 < 
c < 1 is given by 



L{a) 



ax 



V(aa; + L{a)){x + (1 - l/a)L{a)) 



(32) 
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FIG. 1 : Analytical density /* (a;) in (32) for c = 0.6 (solid black) together with results from i) (red) numerical diagonalization of 10® matrices 
of size 20 X 20, where only samples having 12 positive eigenvalues were retained for the statistics (c = 0.6), and ii) (blue) Montecarlo 
simulations of the Coulomb fluid with A'^ = 50 particles. 
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FIG. 2: The optimal density of eigenvalues f*{x) (Eq. (32)) for c = 1/2 (red), 3/4 (green) and 0.995 (blue). 



where 



Lia) 



aV2 



Va^ - a + 1 

and the parameter a is determined implicitly as a function of c from (26) by the condition: 

r-l 



dij 

' \ y 



a — 1 TT f a — 1 



For general c, the equilibrium density (32) has support on the union of two disconnected intervals 

[-L{a)/a, -(1 - l/a)L{a)] U (0,i(a)]. 



(33) 



(34) 



(35) 



One can easily check that in the two limiting cases c = 1/2 and c = 1, our general solution reduces respectively to (30) and 
(31). 

• c = 1: this corresponds to having no negative eigenvalues at all, thus the equilibrium density must match the solution in [3] 
at z = 0. This is achieved as long as a 2 and thus L{a) —J- ^/sJ3 as expected (compare to [ ]). Then the blob of negative 
eigenvalues in (32) (see (35)) collapses to a single point and vanishes. 
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• c = 1/2: this case represents the usual Wigner's semicircle and is recovered from (32) when a ^ 1 and consequently 
L{a) — > \/2. In this case, the support (35) becomes compact [— v^, v^] as it should. 

In the next two subsections, we provide two alternative derivations of (32), the first one based on a scalar Riemann-Hilbert 
ansatz and the second one based on an iterated appUcation of Tricomi's single support solution. 



B. Method I: proof of (32) via Riemann-Hilbert ansatz 

In the context of counting of planar diagrams, Brezin et. al. [38] encountered singular integral equation of the type (27) with 
a single support solution. They did not use the explicit Tricomi solution, but instead developed an alternative method using a 
scalar Riemann-Hilbert ansatz. This method makes use of properties of analytic functions in the complex plane. Even though the 
method requires making a guess or ansatz (verified a posteriori), it turns out to be rather useful. This method can be generalized 
in a straightforward manner to the case when the solution has multiple disconnected supports and has been used before in other 
contexts [an example in a specific case can be found in the appendix of [15], see also [39]]. Let us illustrate below the main 
idea behind this method. 

Let us consider the singular integral equation 

g{x) = Pr / dx'^^ (36) 
Js x~ x' 

where the solution f{x) has support on the union of a finite number of intervals § = Ufclil'^fci /^fc] the real line and is 
normalized to unity: f{x)dx = 1. The next step is to define a complex function F{z) (without the principal part) 



-oo 

/•oo 



F{z) = / dx'^ (37) 

Z — X 



in the complex plane. The function F{z) has the following properties: 

1. it is analytic everywhere in the complex z plane outside the cuts S = Ufcli [o^fcj /^fc] on the real line 

2. it behaves as 1/z when \z\ oo since j f{x')dx' — 1 due to the normalization. 



3. it is real for z real outside the cuts S = Ufcli Wk,Pk] 

4. as one approaches to any point x on the cuts S = Ufcli [o^fc, /3fc] on the real axis, F{x ± ie) g{x) =F iT:f{x). This is a 
consequence of (36). Thus, f{x) = —^lni[F{x + ie)]. 

The general theory of analytic functions in the complex plane tells us that there is a unique function F{z) which satisfies all 
the four properties mentioned above. Thus, if one can make a good guess or ansatz for the function F{z) and verifies that it 
satisfies all the above properties, then this F{z) is unique. Knowing F{z), one can then read off the solution f{x) using the 4-th 
property mentioned above. 

In our case, g{x) = x, f{x) — f*{x) and from the simulation results we already know that there are only two supports for 
1/2 < c < 1, one on the positive side and one on the negative side. To make a good guess for F{z), let us reexamine the precise 
form of the solution in the two limiting cases c = 1/2 and c = 1 



fi/2{x) cx ^2 — x^, Wigner's semicircle (38) 

/* (x) cx y^nZZpx + L2] , DM [3] (39) 

with L2 = •\/8/3. For intermediate values of c, we then seek a sensible two-support ansatz that interpolates between (38) and 
(39). A suitable ansatz, that is verified a posteriori, is 



1 L 



f*{x) = —=\ ^{ax + L){x + hL) (40) 

Tr-^a V X 

which has support over x E [—L/a, —bL] U [0,L]. The unknown parameters a, b, L depend on c in such a way that for c ^ 1/2, 
a — > 1, 6 — > 0, L — > \/2 and for c — > 1, a — > 2, — > 1/2, L — > \/8/3. We can then make the following guess for the function 
F{z), valid everywhere in the complex plane z, except on the cuts x E [—L/a, —bL] U [0, L] on the real axis 



L 



F{z)^z-\ yJ{z + L/a){z + bL). (41) 
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It is easy to check that the definition (41) indeed satisfies all the four properties mentioned above and hence is unique. From the 
4-th property mentioned above, namely taking the limit z ^ x + ie with x G [—L/a, —bL] U [0, L], it follows that that f*{x) is 
indeed given by (40). 

To fix the parameters a, b and L, we will use the 2nd property of F{z) mentioned above, namely that as j^l oo, F{z) ~ 1/z. 
Expanding F{z) in (41) for large z we get 



1 ^ 







L 




az 


li 











1/2 



1 



bL 

z 

D{a,b, L) 



1/2 



+ 0{z-^) 



(42) 



where 



D{a,b, L) 



I7_ 
8a2 



[l + a(2-25 + a(l + fe)2)] 



(43) 



Imposing the exact asymptotic decay F{z) '^1/ z for large |z|, we immediately get the two conditions 

--1+6=0 
a 

D{a,b,L) = 1 



(44) 
(45) 



which leads to 



6 = 1-- 
a 



L = L(a) 



iV2 



\/a^ - a + 1 



(46) 
(47) 



as stated in (32). Thus, we are left with only one unknown parameter a. This is fixed from the normalization condition 
Jo^"^^ fc{^)dx = c leading to (34) which determines a implicitly as a function c. 



C. Method II: proof of (32) via double iteration of the Tricomi solution 

While the method (I) presented in the previous subsection, for finding the solution with two disconnected supports of the 
integral equation (36) with g{x) = x, is rather elegant it has the drawback that one has to make a judicious guess for the function 
F{z). It is thus desirable to find a method where one does not need to guess. We show in this subsection that indeed it is possible 
to obtain an explicit two-support solution to (36) without making an a priori guess. The main idea behind this new method (II) 
is to actually use the Tricomi single support solution twice. Let us first outline below the basic principle behind this idea which 
turns out to be rather general and works for arbitrary source function g{x) in (36). 

We consider again the integral equation 

g(x)=FT f dx'^^ (48) 



JS X - X 

where f*{x) is assumed to have nonzero solution over two connected components S = [/i, ^2] U [Li, L2], with ^i < ^2 < < 
Li < L2- Note that the equation (48) holds for x G [h,l2] and also for x G [Li,L2]. Let us write the solution as 

= I {IH ^tti 1 (49) 

\ fc{x) for xe[Li,L2] 
Then (48) can be divided into two parts (respectively for the left and the right supports) and rewritten as 

f'^ f^(x') f^^ P(x') 

g{x)^ dx'^-^^+Pv dx'-^^^, forxG [Li,L2] (50) 
Jh x-x' Jl^ x~ x' 

g{x)=PT dx'i^^+ dx'i^^, for x G [Zi, W (51) 
Jh 2; -a;' x-x' 
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Note that for x € [Li, L2], the integral over [li, I2] becomes an ordinary integral (as there is no pole and we can drop the Pr) 
and similarly for the other side. 

The main idea then is to eliminate say f^{x) from these two equations and obtain a single integral equation for fl{x). This is 
carried out in the following way. For x E [Li, L2], (50) can be rewritten as 



g{x) = g{x) 



dx 



(52) 



The solution /^(x) has a single support over [Li, L2]. Hence we can now use the explicit Tricomi solution (28) (replacing g{x) 
in (28) by the new effective source function g{x)) to express (x) (for x G [Li, L2]) as a functional of (y) where y £ [li, I2] ■ 
Next, we use this explicit solution for f^x) in the second equation (51) and thus obtain a single integral equation involving 
fl{x). It turns out that for arbitrary g{x), this integral equation for fl{x) can be recast, with a suitable multiplicative factor, 
in the same form as (27) and since fl{x) has only a single support over [/i, I2], one can again use the Tricomi solution (28) 
to explicitly obtain Jl{x). This is the general programme. Below we show how the steps actually work out. Even though the 
method is quite general and works for arbitrary g{x), let us focus below on our specific case g{x) = a; just for simplicity. 
Our basic saddle point equation reads 



dx 



where the density f*{x) must also satisfy the two constraints (25) and (26): 



dxf*{x) = 1 and 



dxf*{x) 



(53) 



(54) 



The solution f*{x) is expected to have support over two disconnected components S = [Zi, ^2] U [Li, with /i < ^2 < < 
Li < L2- For consistency, we expect = = f*{L2). We also expect f*{Li) = if Li > (or otherwise Li = with 

no constraint on f*{Li)), and similarly fcih) = if ^2 < 0. We divide f*{x) into two parts as in (49). The constraints thus 
become: 



dxfl{x)+ I dxfl{x)^l and 



fc{x)=c (55) 
We then apply Tricomi's theorem (28) to (52) with g{x) ~ xio determine (y) on the interval y G [£1,^2] and obtain 



Ti'^^y - LiV^2 - y 

1 



TTC + Pr 



du 



\Ju — L\\JL2 — 



'Li u-y 
(L2-Li)2 + 4(Li + L2)y 



t — u 



TT^/y - Li^L2 - y 
where we have used the following result: 



dtfl{t) 



^/Ll -WL2-t 

t-y 



Pr 



du 



Li 



and 



As explained above, we expect /* {L2 

L\ + 2L1L2 



{u-y){t-u) 

dxfl{x) = l- 
0. Thus 



= TT 1 



yxi - WL2 - 1 
t-y 



dxJl{x) 



1 
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dtfl{t) 



yxi - - 1 

t-Lo 



0. 



(56) 



(57) 



(58) 



(59) 



Multiplying fl{y) by ir^J {y — Li)(i2 — y) in (56) and subtracting (59) from it gives a rather compact expression 



fAy) 




y 



dt- 



flit) [lT^ 

't-y \ L2-t 



for y e [Li,i2] 



(60) 
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Next we substitute this expression of /| (x) in the saddle point equation (51) valid over the left support [li , I2] (with g{x) = x). 
The resulting integrals can be carried out explicitly. We need to use the following integral 



L2 



dy L2-V 



:i 2; - y y y - Li 
valid for x < Li < £2- After a few steps of algebra we get 



= 1 - 



L2 - X 
Li — X 



(61) 



a; — Pr 



dt- 

X — t 

L1+L2 



IL2 


— X 




— X 



dy = 

\/Li- x\JL2 - X - Pi 



dm-v. 

X — t 



t L2 — X 



t — X y Lo — t \ L 



Cancellations of terms from both sides then lead us to the following integral equation for fl [x) for x E [li, I2] 



Pr 



dtlM.flEl^h^.,. 



t — X y L2 — t 



(62) 



(63) 



Defining fl{x) = fl{x)J ^^tif , we get an integral equation over [li, Z2] 



Pr 



dt 



m L1-L2 



t-x 



(64) 



which, fortunately, has the same form as the original single-support saddle point equation (27) with the source function g{x) 
(Li — ^2)72 — X. This can be inverted exphcitly using (28). Enforcing the constraint fl{li) = 0, we get 



{L2 - Li){l2 - k) , q-3ll + 2lil2 



fKx)dx + 

Using this in the Tricomi formula (28) finally gives us a rather expUcit solution 

Li — L2 Z2 ~ h 



= 



rlf . 1 X- h L2- X 

fc[X) = -\ \ J 

Try i2 — X \ Li — X 



for li < X < I2 



(65) 



(66) 



We can now replace fl{x) in the expression of f1{x) given in Eq. (60). Finally we get the expression of the density f*{x) 
(fci^) = fci^) on [luh] and /*(.t) = /^(x) on [Li,i2]): 



TT 



X — ll 

X 



Ll — X 



Ll — L2 , I2 ~ h 



for X e [I1J2] U [Li,L2] 



(67) 



So far we have used two physical conditions fl{li) — and f^{L2) — which are evidently manifest in the explicit solution 
(67). Substituting in (59) the expression of from Eq. (67), we get an identity for the edge points of the support 



1 + 



Ll + 2L1L2 — 3L2 I ( I2 ^ W 



h + 2L2 



2Li) 



(68) 



In addition, we have one more condition J^^^ f^{x)dx = c. Thus we have four unknowns li, I2, Li and L2 and two conditions 
mentioned above. To determine all the constants, we need to impose some additional conditions at the other two edges x = I2 
and X = Ll. With these conditions imposed, one obtains a unique solution for a given value of c as demonstrated below. 

It is clear we must have either Li = 0, or ii > (but with f*{Li) = 0). Similarly, we must also have either I2 = 0, or 
I2 < (with f*{l2) = 0). 



• First case: I2 = — Li. 

Eq. (68) gives 8 = 3L^ + 3lf + 2I1L2. Thus: 



1 



1+1-2 



rAx) - -^{x-li)iL2-x) 

TT 



for X e [Zi,0[U]0,L2] 
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The last constraint J^^ f*{x)dx = c implies that f*{x) is integrable in zero, thus li+ L2 = 0. Finally, using Eq. (68) we get 
L2 = —li = \/2and: 

f*{x) = -V2-a;2 for x G [-V2,V2] (69) 

TT 

and we recover the Wigner semicircle law, having a single support [— \/2, V^]- Note that in this case c = dxf*{x) — ^ 
already is fixed. Thus, this solution is valid only for c = 1/2. 

Second case: h < with ,f*{l2) = and Li = 0. 

In this case, the density has a support over [^i, ?2]U]0, L2]- We get: 



TT V X 

with L2 = -(/i + h) (because f^ih) 0) and 1 + + (^^) (3Zi + ^2 + 2L2) = (Eq. (68)). Let us define 
a = —L2/li- We readily obtain the claimed solution (32): 

L2 = L(a) = g^^^ ~ — ^ and I2 = —L9 ( 1 — (71) 

Va^ - a + 1 a V «/ 

As ^2 < and Zi < I2, we have: 1 < a < 2. Because of the last constraint J^^ f*{x)dx = c, the parameter a must also 
satisfy the following equation: 



1 — y 12' 1^^-'^ """(i a— 1 



in complete agreement with (34). 



• Third case: Li > with f*{Li) = and h = 0. 

This is the exact symmetric of the second case. It corresponds to c < 1/2. 

• Fourth case: ^2 < with ^{h) = and Li > with /*(Li) = 0. 

The constraints f*{l2) = and f*{Li) = give respectively L2 — Li = —{li + I2) and Li + L2 = I2 — h- Thus L2 = —li 
and Li = l2- As I2 < < Li, this case is impossible. 

In conclusion, there is only one unique solution (case 2 above) which is valid for all 1/2 < c < 1 and in the limiting case 
c = 1/2 this solution coincides with the first case above that is valid only for c = 1/2. 

D. Evaluation of the action and derivation of "l>(c) 

Having computed explicitly the saddle point solution f*{x) in Eqs. (32)-(34), the next step is to evaluate the saddle point 
action S[f*{x)] where the action S[fc{x)] is given in (19). This will then provide the expression for the large deviation function 
$(c) associated with the index distribution in (23) 

,M.is[/;wi-<i±fS<?»^ ,73, 

Upon substituting the saddle point solution in the action (19), one gets: 

s[f:ix)]^ x^f:{x)dx~ / f:{x)f:{x')\og\x-x'\dxdx'. (74) 



— oo J — oo 



By construction, the saddle point solution /* [x] automatically satisfies the two constraints and hence the terms involving the 
two Lagrange multipliers drop out in (19). One can directly substitute the explicit expression of f*{x) from (32) to evaluate the 
double integral in (74). However, this is a bit cumbersome. It turns out to be convenient to use a slightly different trick. Note 
that /* (x) satisfies the saddle point equation 

/oo 
f2{x)\og\x~ x'\dx (75) 
'OO 
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The important point is that this equation is valid for all x where the solution f{x) is nonzero, i.e., for all x G [li, I2] U [0, L{a)] 
where li — ~L{a)/a, I2 = — (1 — l/a)L{a) and L{a) is given in (33). 

To evaluate the action, we multiply (75) by f*{x) and integrate over all x. Using the two normalization conditions; (i) 

1^00 fc{x)dx = 1 and (ii) J^' f*{x)dx = c we get 



00 nOO 



00 J —00 



/:(x)/*(x')log|x-x'Ma;dx' = - 



x'^f*{x)dx + Aic + A2 



Substituting this result in the action (74) gives 



s[f:ix)] = - 



Denoting /X2 — f^{x)dx we get from (73) 



/oo 
x^f*{x)dx - Aic- Ai 
-OO 



'J'(c) = [(3/2 - + log(2) + A^c + A2] 



(76) 



(77) 



(78) 



It then remains to evaluate /i2 and the Lagrange multipliers A\ and A2. 

To determine the Lagrange multipliers we proceed as follows. Let us recall the function F{z) defined in (37) for all z in the 
complex plane except on the real cuts x e [— L(a)/a, — (1 — l/a)L(a)] U [0, i(a)]. Setting z — x real, but outside these two 
cuts, and L = L{a) we can make a large x expansion 



F{x)= [ l^dx' = T^ 
J x-x' ^ 



(79) 



where fj,n = J f*{x)x'^dx is the n-t\\ moment. From the explicit solution of f*{x) in Eq. (32) one can check that ^0 = 1 and 
also /i2 = 1/2 (independent of c). 

In addition, for real x > L,we have from Eq. (41) 



F{x) = X - 



L 

x + - 
a 



1--]L 
a 



(80) 



On the other hand for x < —L/a (on the real line to the left of the edge —L/a of the left support), the function F{x) has the 
form 



F{x) = X 



'(x-L) 



L 



1 - 



1 



(81) 



where the square-root is always chosen to be positive. Note that with this choice in (81), F{x) « l/a; for large negative x. 

To determine Ai and A2, we need to choose a value of x in (75) such that it belongs to either of the two supports. Choosing 
X = L and x = —L/a gives the following two equations 



L'+Ai+A2 = 2 

+ = 2 



f:{x')\og{L~x')dx' 
f*{x')\og{x' +L/a)dx' 



(82) 
(83) 



where the integral runs only over the supports. Writing log(i — x') ~ log(i) + log(l — x' /L), expanding the logarithm in a 
series and using the definition of we get from (82) 



L' + M+A2 = 2log{L)-2Y^^ 



(84) 



Similarly, in Eq. (83) we write log(a;' + L/a) ~ log{L/a) + log(l + ax' /L) and expand the logarithm in a series to get 



L^/a^ +A2 = 2 log{L/a) - 2 ^ 



(85) 



14 



We can then determine Ai and A2 in terms of by solving the two linear equations (84) and (85). It is actually convenient 
to express the series involving /i„ in terms of the following integrals. Using Eq. (79) and using /io = 1 we get, 



00 

F(x) - - = V 



X" 
n—1 



(86) 



Let us first consider the regime x > L. Here, let us define 



(.) = Fix) - ^ = - ^ - + ^) + (1 - ^) (B7) 

where we have used the definition of F{x) in Eq. (80). Integrating Eq. (86) over [L, 00] gives 

/ W,i.)d. = j:^ (88) 

Next we consider the regime x < —L/a. Here we use the definition of F{x) in Eq. (81). Integrating Eq. (86) over 
[—00, —L/a] gives 



-L/a 



Fix) - - 

x 



dx = -yt^ (89) 



n=l 



It is convenient to make a change of variable x — > —x on the l.h.s of Eq. (89). Using the definition of Fix) in Eq. (81) this 
finally gives 

rw,ix)dx^f:t^ (90) 

where W2 ix) is given by 



W2ix)^x-'--J^-^(x-^)(x-(l-l)L] (91) 



a; 



Next we insert the expressions of the two sums from Eqs. (88) and (90) in the two linear equations (84) and (85), solve for Ai 
and A2 and then substitute them in Eq. (78). This then yields the main result 

He) = l{L'~l~ log(2L2)] + log(a) - ' ' ^ I Jl Q ^'^"^^^^'^ ^^^^ 

where W^i(a;) and ^2(2;) are defined respectively in Eqs. (87) and (91). Unfortunately the two integrals are difficult to compute 
analytically. However, they can be easily evaluated by Mathematica. A plot of this function is provided in Fig. (3). This final 
form turns out to be the most convenient one for carrying out the asymptotic expansion near c = 1/2 in the next subsection. 

E. Asymptotic Expansion of "I>(c) near c = 1/2 

We now expand $(c) in Eq. (92) for c close to 1/2. We set c = 1/2 + S with 5 > being small. Let us also define the 
parameter e by 

(93, 

When c— > 1/2, a— > 1 from Eq. (34), hence e is a small parameter for c close to 1 /2. It follows from Eq. (93) that 
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where we have chosen the root that gives a — > 1 as e — ;> 0. It also follows from Eq. (33) that 

= r. (95) 

1 — e 

Let us first establish a relation between 5 and e when both are small. Eq. (34), in terms of e and 5, can be recast as 

(96) 



J{e) = j'^ dy Vy' + y + e = |(1 - e){l/2 + 6). 



Let us first analyze the integral on the l.h.s of Eq. (96). To find its asymptotic behaviorfor small e, we first note that J(0) = 7r/4. 
Next, taking a derivative with respect to e gives 



Ae)^U dyJ'-y (97) 
2 Jo y y ^Jy^ + 2/ + e 

Make a change of variable y = ez in the integral and take the limit e 0. To leading order in small e one easily finds 

J'(e) = -ilog(e) (98) 
Integrating and using J(0) ~ 7r/4 one then finds for small e 

J{^ = \-\e\og{e) + ... (99) 
Comparing the left and the right hand side of Eq. (96) then gives, to leading order in small e 

TT 

Inverting Eq. (100), one can express e as a function of 8 and to leading order for small (5 one gets 

7r5 



elog(e) (100) 



log(<5) 



(101) 



We are ready to expand <l?(c) in Eq. (92) for small 8 (or equivalently for small e). There are 5 terms on the right hand side of 
Eq. (92). We expand each of them separately. 
The first term gives, upon using Eq. (95) 

Ti = i - 1 - log(2L2)] = i[i _ log(4)] + if + + o(e3). (102) 
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The second term, upon using c = 1/2 + (5 and a from Eq. (94) and expanding for small e 



The third term gives 



The fourth term gives: 



T-- 



k = -^a-i/a^ 



^ ^.{l-l/a')L^ = --e + e5 + 0[e') 



Ta = — I dx 



= -[-1 + log(4)] + + ^ ^^^5 + ^e<5 + Q{^) 



Similarly, the fifth term gives: 



dx 



L/a 



X \ X \ a I \ \ a 



I, , TT-l (-l+l0g(4)) TT-l , 

= -[-1 + log(4)] - -^e - ^ ^^^5 + + 

Adding the five terms one gets, to leading order, 

$(c = 1/2 + (5) = Ti + T2 + Ta + ^4 + Ts = + ©(e') 

Using the expression of e as a function of 6 from Eq. (101) then gives our leading order result for small 6 

7r2 52 



$(c= 1/2 + (5) ~ -■ 



2 logd^ 



(103) 



(104) 



(105) 



(106) 



(107) 



(108) 



Substituting this result in Eq. (23) we then get, for c = 1/2 + (5 with 6 small (note that by symmetry one can similarly obtain 
the form of the function for 5 <{) also) 



J'((l/2 + (5)iV,iV) wcxp 



2 ]\t2 



'21og(|<5|) 

Resetting 5 = — N/2)/N and assuming (N^ — N/2) << N one gets the Gaussian distribution in the large N limit 



N) w cxp 



{J^+ - N/2Y 



21og(7V) 

from which one can read off the variance for large N and for all (3 

A(A^) = ( - ):^^log(7V) + 0(l) 

This result is in agreement with that of Cavagna et al. [3 1 ] for (3 = 1. 



(109) 



(110) 



(111) 



III. NUMERICAL SIMULATIONS 

In this section, we explain how to compute numerically the index distribution for a Gaussian random matrix ensemble and to 
compare the results with analytical predictions. The joint distribution of the N eigenvalues of a x iV Gaussian random matrix 
with Dyson index /? is given in Eq. (12) by: 



j<k 



Zn 



(112) 
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with E [{Xi}] = I J2i ^ '^i<j ^'-'S ~ ^'^^^ sample the distribution in Eq. (112) using a Metropolis Monte 

Carlo algorithm and to construct a histogram of the number of positive eigenvalues 3^+ = X^t^i ^(^i)- For large N, we expect 
the distribution of "N^ to be of the form (see Eq. (9)): 

y{J<+ = cN, N) - cxp [-;3iV2$(c)] (113) 

Therefore we want to construct a histogram of the rate function $num(c) = „ iogy(J^+-cAf,Af) compare with its analytical 
expression $(c) for large N given in Eq. (92). 

As 3\f+ is a discrete function of the N eigenvalues, it takes integer values between and N. Numerically it is easier to consider 
continuous functions and to come back to >f+ only at the end. Therefore we introduce a smoothed version of the Heaviside theta 
function ^^(A) and thus of Let us define for 77 > 0: 

1 ^ 
^r,(A) = — and X„ = ^0,,(A,) (114) 

The function 6*^ increases from (in the limit A —00) to 1 (in the limit A 00). It has the same symmetry with respect to 
the origin as the Heaviside theta function: 6',,(-A) = 1 - 6*,, (A). Thus we have ^(K,, = cN, N) = Til^jj = (1 - c)N, N). The 
parameter ij gives the width of the jump from to 1 and lim^_).o 0t]{X) = d{X), thus J^q — J^+. 



A. Distribution of 3sf^ : non-standard Metropolis algorithm 

In this section, we explain the Metropolis algorithm and a modified version that allows us to reconstruct numerically the full 
distribution of 1^,^ for a fixed and large enough value of 7]. 



Standard Metropolis algorithm 

We start with an initial configuration of the A^s (real numbers of order ^/N). At each step, a small move {Xt} — > {X[} is 
proposed in the configuration space. In our algorithm, it consists of picking at random an eigenvalue Xj and proposing to modify 
it as Xj — Xj + e, where e is a real number drawn from a Gaussian distribution with mean zero and with a variance that is set 
to achieve the standard average rejection rate 1 /2. 

The move is accepted with probability 

(e-^(B[{Aa]-B[{A.}]),l^ (115) 

and rejected with probability 1 — p. This dynamics enforces the detailed balance and ensures that at long times the algorithm 
reaches thermal equilibrium (at inverse "temperature" (3) with the correct Boltzmann weight e~^-^^^^'^\ 

At long times, the Metropolis algorithm thus generates samples of {Xi} drawn from the joint distribution in Eq. (112). We 
can start to keep the value of 'Nri — J2iLi ^-n {Xi) for the configurations of eigenvalues generated by the algorithm (say every ten 
steps) and construct a histogram for INf,,. 

However, the distribution of 3\f,, is expected to be of the form J'(IN'r, = c,, A^, N) ^ exp [— /3A^^<I>,j (c,,)] for large N exactly as 
for 3\r+, and thus to be highly peaked around its average. The events in the tails of the distribution are extremely rare. Therefore 
we can not, with a standard Metropolis algorithm, explore in a "reasonable" time a wide range of values of 3\f^. We propose 
below a modified version of the algorithm that allows us to explore the far left and right tails of the distribution. 



. y{x\,...,x'^) \ 
^^"^Ha'(A„...,A.)'V 



Modified algorithm: conditional probabilities 

We want to explore regions that are far from the mean value of ie far from (3\f,,) = A^/2 (by symmetry), for example the 
far right tail Jin = Nc,, with > 1/2. 

The idea is thus to force the algorithm to explore the region c,, > c* for different values of c* . We thus add in the algorithm 
the constraint > c*. More precisely, we start with an initial configuration that satisfies INfy, = CjjN > Nc* . At each step, the 
move is rejected if ?\f,, < Nc* . If > Nc* , then the move is accepted or rejected exactly with the same condition as before 
(see Eq. (1 15)). Because of the new constraint c,, > c*, the moves are rejected more often than before. Therefore the variance 
of the Gaussian distribution P(e) has to be taken smaller to achieve the standard rejection rate 1/2. 
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We run the program for several values of c* and we construct a histogram of J^jj for each value c* . This gives the conditional 
probability distribution P (3\f^|]^[^ > Nc*). Again, the algorithm can only explore a very small range of values of 3\f^. The 
difference with the previous algorithm is that we can now explore small regions of the form Nc* < 3Nf^ < Nc* + 5 for every c*, 
whereas we could before only explore the neighbourhood of the mean value N/2. 

The distribution of J^^j is given by 



y (N,,) ^ y {l^r,\y^^ > Nc*) T (N,, > Nc*) (for every K,, > Nc*). 
Therefore the rate function reads: 



loga'(:N'^ =C^N) _ l0gT(N^ = Cr,N\3^ri > Nc*) 



PN^ 



pm 



for Cri > c* 



(116) 



(117) 



where Kc* = — '"^^'•^^^^ is a constant (independent of c,,). In order to get rid of the constant Kc", we construct from the 

histogram giving P (3\f^ | IN",, > Nc* ) the derivative of the rate function. This derivative is equal to '^'^'jj'^'''' • The constant Kc* 
disappears. 

We can come back to $^(c,,) (and thus P (]sf,, = c,,iV) = e^^^ *>?(ci,)) from its derivative using an interpolation of the data 
for the derivative and a numerical integration of the interpolation. 
We typically run the algorithm for iV = 50 and 10^ iterations. 



$(C) , $num(c 



0.15 



0.10 



0.05 



0.00 L 



FIG. 4: Rate function $num (c) — — log J'(!N-|- = cN, N) / (/3A^^) plotted as a function of c for A'' = 50. The red points are numerical data 
obtained with the method explained in section III with rj = 0.5. Each point corresponds to an integer value of !N+ = cN. The blue solid line 
is the analytical prediction <I>(c) given in Eq. (92) for the large A'^ limit. 




B. Back to N 



+ 



Using the algorithm explained in the previous subsection, we get the distribution of [IsT,, for a given value of rj. A natural way 
of recovering the distribution of 3\f_|- would be to run the algorithm for smaller and smaller values of rj as "N^ = lim,,^o ^j;- 
However, as explained above this is not an efficient method numerically. For small 7], the distribution of X,, is indeed not smooth; 
3\f+ = INfo even takes integer values, i.e. it is discontinuous. 

A better procedure consists in running the algorithm for a fixed (and not too small) value of 77, typically ?/ = 0.5 for N — 50, 
and exploiting the joint data that we can get for IN",, and 1^+. When running the algorithm, we can indeed construct a joint 
histogram for 3\f_|- and ?\f,j (by keeping the value of J^^ and 3\f,j every ten steps). With all the data for many values of the 
constraint c* and after having filled up the histogram by symmetry around 1/2, we can then get a full histogram for J'(?sf+ |3\f,,). 

Finally we recover the distribution of ?\f_|_ by numerical integration over ?\f,, : 



(118) 
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In Fig. 4, we plot the rate function $1^111(2) = — ^ ^]v^^ ' — obtained numerically with the method explained above and 
compare with its analytical expression (E>(c) for large N given in Eq. (92). The agreement is quite good. As the distribution of 
3\f_(. (and 'Njj to a lesser extent) is not smooth for finite N, there are finite size effects and the convergence is a bit slow in the 
simulations. Therefore the agreement between numerics and the theory is less good very far from the mean value. 



IV. PROBABILITY OF A GAP [Ci , C2] IN THE SPECTRUM 

As an apphcation of the general result derived on the two-support solution in Section 11, here we address the natural question: 
what is the probability that there are no eigenvalues on the interval [(i, (2] (where (i < ^2) for a Gaussian random matrix? As 
discussed earlier, the natural scale for the eigenvalues of Gaussian random matrix is \fN for large N. Hence, it is appropriate 
to rescale Ci — wi\/lV and C2 = W2VN and denote this gap probability as P{'Wi,W2, N) with wi < W2- 

The computation of P{wi, W2, N) is performed in two steps. First we fix the number of eigenvalues that are bigger than W2 
to be N+ ~ cN where c denotes the fraction. Naturally the number of eigenvalues that are less than wi is then N- = (1 — c)N. 
Let P{wi ,W2,c, N) denote the gap probability for a given fixed c. Then the full gap probabiUty is obtaining by summing over 
all possible values of c 

P{wuW2,N)^ [ dcP{wi,W2,c,N) (119) 



The gap probability P{wi, W2,c, N) for a fixed c and for large N can be computed exactly in the same way as the index 
distribution in Section 11. Once again we have the optimal charge density with two disconnected supports, one to the left of wi 
and one to the right of W2- Therefore, the general solution in (67) will still be valid with the only exception that in this case the 
edges I2 = wi (the upper edge of the left support) and Li = W2 (the lower edge of the right support) are already fixed. Hence 

wi - L2 , W2- h 




for X e [h^wi] U [W2,L2]. (120) 

It remains to fix the still two unknowns li (the lower edge of the left support) and L2 (the upper edge of the right support). They 
are fixed by the consistency condition (68) which in this case reads 

l + — ^+ f^^j (3/1+ wi+ 2X2-2^2) = (121) 



and the normalization condition J^^ f^{x)d. 



W2 



X = c. 



One then uses this optimal solution to evaluate the saddle point action S[f*{x)] (as in (77)) and compute the associated large 
deviation function $(c, wi, W2) (which now depends on wi and W2) from (78). This gives for large TV 

P{wi,W2,c,N) w cxp [-l3N^'S>{c,wi,W2)] ■ (122) 

Substituting further this result in (119) and evaluating the integral over c by another saddle point one finally gets the gap 
probability for large N 

P(wi,W2,iV) « exp [-/37V^5'(wi, W2)] ; with W2) = $(c*, wi, W2) (123) 

where c* minimizes the function $(c, Wi,W2) over c e [0,1]. Physically the quantity l3N'^'^{wi, W2) just represents the energy 
cost in separating the two blobs of charges by a gap [wi , W2] from their natural Wigner semicircle configuration. 

In principle one can compute the large deviation function vl> [wi , W2 ) for arbitrary [wi,W2] by following the above procedure. 
Here, for simplicity, we present the explicit result for the simple case when the two walls are placed symmetrically around the 
origin: wi ~ —w and W2 = w. In this case, it is evident due to the symmetry that the optimal value must be c* = 1/2. The 
optimal solution in (120) for c = 1/2 is also symmetric around a; = with li = —L and L2 = L and has the simple form 



/i/2(2^) - for e [-L, -w] U [w, L] (124) 

The only unknown L is fixed by the normalization condition f*,^{x)dx = 1/2. This uniquely fixes 



L = Vw^ + 2. 



(125) 
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FIG. 5: Analytical optimal density fx/2{x) in (124) corresponding to a gap over the interval [— with w = 1. The density has two 
disconnected symmetrical supports over [— \/3, — 1] U [1, \/3]- It vanishes at the upper edge L — \/w'^ + 2 = \/3 of the right support and at 
the lower edge —L = —y/i of the left support. At the edges uii = —1 and W2 = 1, the density has an inverse square root divergence. 



A plot of this solution is provided in Fig. (5). Note that when w ^ Q, f*i^{x) = \/2 — x'^/tt reduces to the Wigner semicircle 
as one would expect, because without any constraint the semicircle form is the natural optimal density for c = 1/2. 

Having determined the optimal solution explicitly, we next proceed to compute the large deviation function $(c, —w, w) from 
(78). For this we need to evaluate the second moment /i2 and the two Lagrange multipliers Ai and A2. Using (124) one can 
easily evaluate the second moment 

(126) 



(127) 
(128) 



Ai2 = / X fli^{x)dx +-. 
To fix the Lagrange multipliers, we substitute x ^ L and x = —L in (75) to get two equations 



+ Ai+ A2 



L^+A2 



fl/2Wog{L-x')dx' 
fi/2i^')^og{L + x)dx 



Using the explicit form of f^^ (x) in ( 1 24) it is easy to verify that both integrals on the right hand side are identical and are given 
by 



/ = 



/*/2(.t') log(L - x') dx' - / fl,^[x') \og{L^ - x^) dx' 



1 - loK 2 



Solving these two linear equations, we get 

Ai=0; and A2 =11 - I? = -\ -\o%2 ~ v? . 

Substituting the values of /i2, A\ and A2 in (78) gives a very simple expression 

9 2 
w w 
$(c = 1/2, — w, ui) = — ; hence 'ii{—w,w) = —. 



(129) 



(130) 



(131) 



This leads to the result that the probabiUty that there are no eigenvalues in the interval [~w, w] for a Gaussian random matrix in 
the limit of large N is simply 



P{—w, w, N) « exp 



2 



(132) 
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Note that when w ^ 0, the probability approaches to 1 which is to be expected since without any constraint the system naturally 
settles into the Wigner semicircle which is gapless at the origin. 



V. A FORMULA FOR THE VARIANCE OF THE INDEX AT FINITE N FOR /3 = 2 

So far, we have computed the index distribution in the large limit. From this result, we were able to show that the variance 
of the number of positive eigenvalues 

A{N) ^ {(N+ ~ N/2f) (133) 

increases logarithmically with N to leading order for large as in ( 1 11). A natural question is if one can derive an exact formula 
for the variance for finite N and not just for large N. In this section, we show that at least in the special case /3 = 2, it is possible 
to derive an exact formula for the variance valid at fixed and finite and is given by 



where: 

Zjv(p) — det 



(e-P + (-l)^+.)r(^i±^) 



i,j = l,...,N 



and (.)' denotes differentiation with respect to p. 
In order to prove (134), we start from the pdf (14) : 



and define its moment generating function (Laplace transform) as: 

N 



1=1 j<k 



^n{p) = ^Zn{p) (138) 



where Zn{p) is given in (135). On the other hand, it is easy to see that: 



Combining (138) with (139) we readily obtain (134). 
In order to prove (138), we start from (137): 



We can write the square of the Vandermonde determinant in (140) as: 



with Ak{x) =^ Bk{x) ~ X , and then apply the Andreief identity [40] : 

TV 



(135) 



Zn{p) = / rrrfA,e-^".^?-^5:£.«(A.) tt |^^. _ ^^,,2 (137^ 

J(-oo,oo)« ZLT 

We are going to prove that: 



Zn{p) ^ f f[dA.e-5:r..^-^^-«(^') n |A. - X,f (140) 

J (-00. 00)" ■ 



Y[ \X, - Afe|2 = dct(Afc(A,))dct(Bfc(A,)) (141) 

j<k 



J f[rfM(A,)dct(Afc(Aj))dct(Sfe(Aj)) = iVIdct (^J dtiix)Akix)B,{x)j (142) 
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valid for a benign integration measure ii{x). In our case, we have ^(x) = e ^ p^(^)^ leading to: 



ZAr(p) = iVIdet 



dx I 



(143) 



Evaluating the integral, we get immediately to Eq. (138). 

From this determinantal representation for the variance in Eq. (134), Prellberg [41] noted that the following exact formula for 
the variance of the index for (3 — 2 holds: 



i+i odd 



(144) 



where [xj stands for the greatest integer less or equal to x, and r(a;) is the Gamma function. 

It is convenient to group the terms in the sum for even and odd terms. In this way we can perform one of the sums and we can 
write for even N (144) as: 



N 2 ""^-^ 



(145) 



m=0 



where: 



r(TO + 1/2)2 



1 1 



3 3 1 



r(m)r(m+l)^^n2'2'^'^-"^'2'2'2-"^"^ 



r(m+ l/2)r(m + 3/2) 



1 1 



3 3 1 



Tim + If '^^n 2' 2' ^'-"^=2' 2' 2""^"^ 



(146) 

where ^F^ is a generalized hypergeometric function. As this expression is complicated to the point of being useless (except for 
numerical analyses) we look for an integral representation for t„i. We achieve this by writing the defining series expansion for 
the hypergeometric function, using an integral representation for the gamma functions in its coefficients and then exchanging 
the integral and the sum. The final result is expressed as an integral over a new variable t G [0, 1] as: 



_ Tf_ 1 (m-1/2)! 



dt , 




[tanh-\Vt) + (2m + l)(Li2(Vi) - Li2(-\/i))], 



(147) 



where Li2{z) = J^'kLo W Polylogarithm function. 



0.44 



0.38 




FIG. 6: The variance of the index A(A'') as a function of log(Ar) for P — 2 (dotted, exact finite TV formula in (145); solid, large A'' in (155)). A 
linear fit for the former gives A(N) ~ 0.052 log TV + 0.184. The prefactor 0.052 is in good agreement with the leading theoretical prefactor 
(27r^)~^ ~ 0.051, and the constant correction term 0.184 is also in good agreement with the theoretical constant C in (157). 



Now, we separate the integral !J,„ in two terms. 



7 -7(1) +7(2) 



dt 

VT^ 



tanh ^{Vt) 



dt — 





(2m + 1) Li2(Vt) - Li2(-Vt) 



(148) 
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and we separate further the second integral as 



7(2 



dt- 



dt 



{2m + 1) \U2iVt) - U2{~Vt) - 7rV4 
(2m + 1) \u2{Vt) - Li^i-Vt) - 7rV4 



— (2m+l) 
2(m- 1/2)!' 



(i<- 



The last term, when inserted back in (147), cancels half of the constant 7r^/2 in so we are left with 



_ 7r2 (m-1/2)! 
*™ " T ~ 27rV2^! 



tanh"^(Vt) + (2?7i+ 1) 



Li2(\/i)-Li2(-\/t)- y 



(149) 



(150) 



An integration by parts on the term in the integrand linear in m (considering that dt [Li2(\/t) — Li2(— \/t)] = tanh ^{\/t)/t) 
gives: 



TT^ {m.- 1/2)1 t 
T " 2^1/2^1 ' 



1 - 1 



+ Li2(-\/t) -Li2(Vt) ) -tanh Wt) 



(151) 



We need to sum this expression over m = 0, N/2 — 1 to get A{N). The constant term in tm will cancel against the linear 
term iV/4 in A and a compact integral representation for A{N) can now be obtained exchanging the order of integration over t 
and summation over m: 



where 



N/2-1 



771 — 



(152) 



(153) 



6-1 



where B is the incomplete Euler beta function, defined as B(z; a, b) — dr r°^^ (1 — t) 

This representation turns out to be very useful to pull out the large N logarithmic growth of A(A'^) and the constant term 
(and possibly could yield a complete asymptotic expansion in 1/N). In order to do this we notice that for large N the function 
K{t, N) is concentrated near i = 1. So we expand the remaining integrand to lowest order in 1 — i obtaining the leading order 
and part of the constant term as: 



^ dt Kit, N)il - t)-'/^ ^ ^ logiV + ^(7 + log 2) + 0{N-^). 



(154) 



where 7 = 0.577215... is Euler's constant. 

One can prove that the remaining terms in the expansion in powers of (1 — t) contribute to 0(1) but not to the leading 
logarithm. We can formally lump these terms together and we can write the asymptotic law for A(A^) as: 



A{N) = ^\ogN + C + 0{N-^), 



where the constant C is: 



C= -^(7+log2)+ lim 



1 



2tt' 



dt K{t, N) 



+ Li2(-\/t) - Li2(v^)^ - tanh-^iVt) - ^ 



(155) 



(156) 



Now the limit N ^ 00 can be taken safely inside the integral (Euler's B function goes to zero) and we are left with the following 
nontrivial constant: 



C 



1 



^(7 + log2) + ^ 



1 -2 + + 2< - 4(1 - t) tanh"^ (y/t) - 4Li2(Vt) + 4Li2(-\/t) 



4(1 -ty 



7 + l + 31og2 
27r2 



0.1852484182. 



(157) 



where in the last step we have performed one extra integration by part. The constant C is in good agreement with the fit of 
the finite N results for large (see fig. 6). A careful series expansion of K{t, N) for large N should give the complete 1/A^ 
expansion of A(A^). This is left for future work. 
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VI. CONCLUSIONS 

In summary, we have computed for large N the probability that a Gaussian matrix NxN with real spectrum has a fraction c of 
positive eigenvalues. Using a Coulomb gas method, a large deviation principle for this probability can be formulated. In physical 
terms, the problem amounts to finding the free energy of a system of charged particles repelling each other via a 2d Coulomb 
interaction and confined into a quadratic well, with the constraint that a fraction c of them is kept on the positive semiaxis. Due 
to the long-range nature of the interaction, the free energy is super-extensive in the number of particles, and scales as ~ O(iV^), 
as it is customary in this type of problems. We have computed explicitly the large deviation function <&(c), which quantifies the 
rate of occurrence of unusual fluctuations of the index, for all < c < 1. This function has a minimum at c = 1/2 around which 
it has a quadratic form modulated by a logarithmic singularity. This logarithmic singularity leads to the result that the variance 
of the index displays a logarithmic growth with the matrix size N for all (3. For /3 = 2, we have found a representation of the 
variance in terms of derivatives of a certain Hankel determinant. Based on this representation, Prellberg [41] was able to give 
an explicit expression for the index variance involving a finite double sum. We performed an asymptotic analysis of Prellberg's 
finite N expression, whose leading behavior is precisely ~ (27r^)~^ log(iV), in perfect agreement with our Coulomb gas result. 
In addition, we determined exactly the constant term C in the expansion, which turns out to be a highly non-trivial value as in 
Eq. (157). 

We have also presented a general method to obtain explicitly a two-support solution of a singular integral equation of the form 
(27). This method consists in iterating the single support Tricomi solution twice. We have demonstrated how this method can be 
used to compute the probability of a gap [Ci ,(^2] in the spectrum of the eigenvalues. Given the fact that singular integral equation 
of the type (27) occurs quite generically for other random matrices (such as Wishart matrices [42]), we expect that this method 
will be useful in a broad variety of applications. 
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